[QNN EP] Skip inputs/outputs shape validation for QNN Batch Multiplier #26336
Conversation
Description
- QNN EP supports a batch multiplier during inference, while InferenceSession::checkShapes validates the input_output_shape against the expected_shape.
- This check is relaxed when all nodes are assigned to QNN EP and the running batch size is divisible by the original batch size.
- A separate PR will be submitted for the implementation of batch multiplier support in QNN EP.

Motivation and Context
- This change brings the QNN API's batch multiplier support to ORT, as described on this page: https://docs.qualcomm.com/bundle/publicresource/topics/80-63442-10/function_QnnGraph_8h_1a3ea05f42a9295f9a74a2e3a0cdd64228.html
@microsoft-github-policy-service agree company="Qualcomm"
```cpp
} else if (i == 0 && is_qnn_batch_multiplier_valid(input_output_shape[i], expected_shape[i], model_->MainGraph())) {
  continue;  // Qnn API supports batch multiplier, but the running batch size must be divisible by the original batch size.
```
I don't think we want to add QNN EP-specific relaxing of shape validation here. If the graph was not running on the QNN EP, would it be considered invalid? Can you elaborate on what you are trying to do?
Hi, thanks for the question.
The shape is only considered valid under QNN EP due to its support for the batch multiplier.
If the graph is not assigned to QNN EP, the original shape check logic applies, and the shape would be considered invalid.
QNN EP supports a batch multiplier, which allows the model to be compiled with a smaller batch size (e.g., 2) and then run inference with a larger batch size (e.g., 128), as long as the inference batch size is divisible by the compile batch size.
This can help reduce compile time (session creation time), while still supporting larger inference batches.
What we're trying to do is ensure that this flexibility is preserved during shape validation when the graph is intended to run on QNN EP.
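For illustration, here is a hypothetical sketch of what the is_qnn_batch_multiplier_valid helper referenced in the diff above might check, based on the behavior described in this thread (the actual implementation is in a separate PR and is not shown in this excerpt):

```cpp
#include <cstdint>

#include "core/graph/constants.h"
#include "core/graph/graph.h"

// Hypothetical sketch: relax the batch-dimension check only when the whole
// graph runs on the QNN EP and the runtime batch is an integral multiple of
// the compile-time batch. Names and placement are assumptions, not the PR's code.
bool is_qnn_batch_multiplier_valid(int64_t run_batch, int64_t expected_batch,
                                   const onnxruntime::Graph& graph) {
  // Both batches must be concrete positive values (no dynamic -1 dims here).
  if (expected_batch <= 0 || run_batch <= 0) return false;

  // Every node must be assigned to the QNN EP; otherwise keep the strict check.
  for (const auto& node : graph.Nodes()) {
    if (node.GetExecutionProviderType() != onnxruntime::kQnnExecutionProvider) {
      return false;
    }
  }

  // QNN batch multiplier: the running batch must be divisible by the
  // batch size the context was compiled with.
  return run_batch % expected_batch == 0;
}
```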
> The shape is only considered valid under QNN EP due to its support for the batch multiplier.
> If the graph is not assigned to QNN EP, the original shape check logic applies, and the shape would be considered invalid.
Thanks for clarifying.
I still think that we don't want to add QNN EP-specific handling here. Looking ahead, consider that for plugin EPs, it would be even less desirable to have similar hardcoded logic at this point.
> QNN EP supports a batch multiplier, which allows the model to be compiled with a smaller batch size (e.g., 2) and then run inference with a larger batch size (e.g., 128), as long as the inference batch size is divisible by the compile batch size.
Is it possible for the QNN EP to manage this optimization internally? E.g., identify that a smaller batch size than the one in the actual shape can be used for compilation.
Hi @edgchen1, here is a more detailed explanation of this PR.
When using onnxruntime_perf_test.exe, onnx_test_runner.exe, or any application that calls InferenceSession, the flow goes through InferenceSession::Initialize() and InferenceSession::Run().

Current Flow
- InferenceSession::Initialize()
  - Uses the source ONNX model's input shape as the expected dimensions. If a dimension in the input shape is dynamic, it is labeled as -1; otherwise, it is a positive integer.
- InferenceSession::Run()
  - Checks the input data shape:
    - If a dimension in the input shape is dynamic, the check is skipped.
    - Otherwise, the dimension must match.
  - After checking, it calls the QNN function to execute the graph.
Our Use Case
- InferenceSession::Initialize()
  - The source ONNX model's input shape is used, and we assign the batch dimension a positive integer as the base batch (e.g., 2).
- InferenceSession::Run()
  - Checks the input data shape:
    - If a dimension in the input shape is dynamic, the check is skipped. → Won't happen in our use case.
    - Otherwise, the dimension must match. → This is what we don't want, because QNN supports a "batch multiplier", which allows the running input batch to be any multiple of the assigned batch number (e.g., 4, 6, 8).
  - After checking, it calls the QNN function to execute the graph.
Reason for Change
- As in the "dimension must match" step of our use case above, this check prevents us from enabling the batch multiplier, which would allow us to improve context preparation efficiency. For example, we could compile the model with batch=2 and support different input batch sizes (see the sketch below), rather than being limited to a fixed batch size.
- If we do not relax this check, there is no way to work around it on the QNN side, since this check occurs in the very first phase of inference.
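To make the use case concrete, here is a minimal sketch using the public ORT C++ API, assuming a hypothetical model "model_batch2.onnx" compiled with a fixed batch of 2 and tensors named "input"/"output". With the relaxed check, running with batch 8 (a multiple of 2) would pass validation when the whole graph is on the QNN EP:

```cpp
#include <onnxruntime_cxx_api.h>

#include <vector>

int main() {
  Ort::Env env(ORT_LOGGING_LEVEL_WARNING, "qnn_batch_multiplier");
  Ort::SessionOptions so;
  // Register the QNN EP with the HTP backend (library name varies by platform).
  so.AppendExecutionProvider("QNN", {{"backend_path", "QnnHtp.dll"}});

  // Model compiled with a fixed batch of 2.
  Ort::Session session(env, ORT_TSTR("model_batch2.onnx"), so);

  // Runtime batch of 8: divisible by the compile-time batch of 2.
  std::vector<int64_t> shape{8, 224, 224, 3};
  std::vector<float> data(8 * 224 * 224 * 3, 0.0f);
  auto mem = Ort::MemoryInfo::CreateCpu(OrtArenaAllocator, OrtMemTypeDefault);
  Ort::Value input = Ort::Value::CreateTensor<float>(
      mem, data.data(), data.size(), shape.data(), shape.size());

  const char* in_names[] = {"input"};
  const char* out_names[] = {"output"};
  auto outputs = session.Run(Ort::RunOptions{nullptr}, in_names, &input, 1,
                             out_names, 1);
  return 0;
}
```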
> Otherwise, the dimension must match. → This is what we don't want, because QNN supports a "batch multiplier", which allows the running input batch to be any multiple of the assigned batch number (e.g., 4, 6, 8).
I'm not convinced that what you're proposing is the behavior that we (ORT) would want. My understanding is that the input dimension at runtime should match what is specified in the static shape. If one wants to set different values for an input dimension, then it should be a dynamic dimension.
Could the QNN EP support this by allowing the "batch multiplier" dimension to be dynamic?
> Could the QNN EP support this by allowing the "batch multiplier" dimension to be dynamic?
For QNN-EP, dynamic dimension support has not been implemented yet.
https://github.com/microsoft/onnxruntime/blob/615c22bf6ba5d259c49484416d0a87a91d936e13/onnxruntime/…
The "batch multiplier" helps save preparation time by compiling the smallest batch, allowing different batch sizes to run without recompiling. However, even we enable dynamic dimensions in QNN-EP, it would not help reduce preparation time in this case.
> Our Use Case
> - InferenceSession::Initialize()
>   - The source ONNX model's input shape is used, and we assign the batch dimension a positive integer as the base batch (e.g., 2).
> - InferenceSession::Run()
>   - Checks the input data shape:
>     - If a dimension in the input shape is dynamic, the check is skipped. → Won't happen in our use case.
>     - Otherwise, the dimension must match. → This is what we don't want, because QNN supports a "batch multiplier", which allows the running input batch to be any multiple of the assigned batch number (e.g., 4, 6, 8).
>   - After checking, it calls the QNN function to execute the graph.
A shape can be dynamic or fixed. If it's fixed, the data must match. Creating a new "dimension must be a multiple of this value" state would invalidate a lot of assumptions in the code and it is not part of the ONNX spec.
Having an EP specific "trust us it's okay" piece of code in ORT core doesn't really work either. What happens if another EP wants a slightly different new state? Do we accumulate a bunch of special cases that potentially conflict with each other?
Can you set some other metadata in the model to indicate the smallest batch size? E.g., put entries in the model metadata like "batch_multiplier":"4" and "batch_multiplier_dim_name":"N" if the symbolic dimension name for the batch size is 'N'. That lets you specify the value and the dim_name it applies to.
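To make this suggestion concrete, here is a sketch of writing such metadata with the ONNX protobuf C++ bindings (the key names follow the suggestion above; how the EP would read them back is out of scope here):

```cpp
#include "onnx/onnx_pb.h"  // ONNX protobuf definitions (ModelProto, etc.)

// Record the batch-multiplier hints as model metadata instead of changing
// ORT core shape validation. The values here are illustrative.
void AddBatchMultiplierMetadata(onnx::ModelProto& model) {
  auto* mult = model.add_metadata_props();
  mult->set_key("batch_multiplier");
  mult->set_value("4");

  auto* dim = model.add_metadata_props();
  dim->set_key("batch_multiplier_dim_name");
  dim->set_value("N");  // symbolic name of the batch dimension
}
```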
Hi @skottmckay, @edgchen1,
In the batch multiplier scenario (where the compile-time batch size may differ from the runtime batch size), it appears that only ONNX models with a dynamic batch size can pass the checkShapes validation. However, since QNN EP does not support dynamic shapes, we require a static ONNX model inside the EP.
Would it be feasible to apply a free dimension override to the ONNX graph within QNN EP to convert its dynamic shape into a static shape, and then perform shape inference to ensure that the output shape is also fully static?
Given that a GraphViewer or Graph cannot be constructed inside QNN EP, we cannot replicate the ONNX graph internally within the EP.
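For reference, ORT already exposes free-dimension overrides at the session-options level; the open question above is whether an equivalent step could run inside the EP itself. A sketch of the existing public API, assuming the batch dimension is named "N":

```cpp
#include <onnxruntime_cxx_api.h>

Ort::SessionOptions so;
// Pin the symbolic batch dimension "N" to a concrete value before session
// initialization, turning a dynamic input shape into a static one.
so.AddFreeDimensionOverrideByName("N", 2);
```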
- Remove redundant code and use an existing function
- Wrap these changes in a macro so that the extra check is only triggered when building with QNN
Hi @yuslepukhin, this PR impacts a feature we are developing; it would be great if you could help review and share your thoughts on it, thanks!
- Add a SessionOptions entry for the QNN HTP batch multiplier
- Avoid validating inputs/outputs if this option is used
Hi @edgchen1, @yuslepukhin, @skottmckay,
```cpp
#ifdef USE_QNN
  const bool batch_multiplier = session_options_.config_options.GetConfigOrDefault(kOrtSessionOptionsQnnHtpBatchMultiplier, "0") == "1";
  if (!batch_multiplier) {
```
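For reference, an application would opt in through the new config key; a sketch assuming the kOrtSessionOptionsQnnHtpBatchMultiplier constant added by this PR lives in the usual session-options config-keys header:

```cpp
#include <onnxruntime_cxx_api.h>
#include "onnxruntime_session_options_config_keys.h"  // assumed location of the new key

Ort::SessionOptions so;
// Opt in to the relaxed batch-dimension validation ("0" = disabled by default).
so.AddConfigEntry(kOrtSessionOptionsQnnHtpBatchMultiplier, "1");
```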
Disabling input/output validation entirely seems a bit drastic. While it would allow your use case, it would also let other invalid cases through and potentially lead to runtime errors that are harder to debug.
Agree with @edgchen1. We should not disable the input/output validation entirely.
@qti-chuteng,
We should still validate the input and output shapes: the batch dimension at run time must be an integral multiple of the original batch dimension, while all other dimensions must match the original shape, as in the example below:
original shape: [2, 224, 224, 3]
input shape at inference time: [10, 224, 224, 3]
Here, batch(run_input_shape) % batch(orig_input_shape) == 0, and the other dimension values must match between orig_input_shape and run_input_shape.
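A minimal sketch of that rule, with a hypothetical helper name:

```cpp
#include <cstdint>

#include <vector>

// Relaxed check: dim 0 (batch) must be an integral multiple of the original
// batch; every other dimension must match exactly.
bool IsValidBatchMultiplierShape(const std::vector<int64_t>& orig,
                                 const std::vector<int64_t>& run) {
  if (orig.size() != run.size() || orig.empty()) return false;
  if (orig[0] <= 0 || run[0] <= 0 || run[0] % orig[0] != 0) return false;
  for (size_t i = 1; i < orig.size(); ++i) {
    if (orig[i] != run[i]) return false;
  }
  return true;
}

// From the example above:
//   IsValidBatchMultiplierShape({2, 224, 224, 3}, {10, 224, 224, 3}) -> true
//   IsValidBatchMultiplierShape({2, 224, 224, 3}, {10, 224, 112, 3}) -> false
```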